-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug in iloc aligned objects #37728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in iloc aligned objects #37728
Conversation
� Conflicts: � pandas/tests/indexing/test_iloc.py
pandas/core/indexing.py
Outdated
if name == "iloc": | ||
for index, loc in zip(range(len(ilocs)), ilocs): | ||
val = value.iloc[:, index] | ||
self._setitem_single_column(loc, val, plane_indexer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be solved with an appropriate call to _align_frame or _align_series? id really prefer to avoid passing name
around. by the time we get to setitem_with_indexer we should always be working with positional indexers (i.e. iloc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm no not to my knowledge. The issue is, that we want to achieve oposing things between loc and iloc, but the objects do look the same. setitem_with_indexer
could be solved with a default value, that is not a problem.
But I can not see a way to do this without passing some kind of flag to the align calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked into this a bit again. The indexer in setitem_with_indexer
already is positional, but the value may not be properly aligned for loc, hence we call the align functions there. This is not necessary for the iloc path, but this information is lost there, if we do not pass it through somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it sounds like we should be doing some appropriate alignment within Loc before passing it into iLoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be an option, but would result in a bigger rewrite and I am not sure if that would introduce more complexity, because the align_series
method is called in various places. Should I give it a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like this PR is making it so that we never call align_frame/align_series if setitem_with_indexer was called from iloc, only if it was called from loc. setitem_with_indexer is only called from one place within loc. Can we just do that alignment right before that call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we could try, but it is not that simple. For example before calling setitem_with_indexer
from loc, value
may be a DataFrame
with different dtypes. Currently _align_series
is called for every column to get a ndarray
with the correct dtype. If we call this before calling setitem_with_indexer
we would have to call _align_frame
or copy the logic, which decides if we align every colum at once or separately. _align_frame
returns an object dtype array, which obviously would introduce a lot of dtype errors. That is what I meant with a bigger rewrite here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explaining, i think i have a better handle on it now.
ive got an upcoming PR to make all 2D cases go through split_path. let's revisit this after that goes through and see if we can make it cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great. Just ping me when merged, then I will have a look.
� Conflicts: � doc/source/whatsnew/v1.2.0.rst
# GH: 22046 | ||
df1 = DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]}) | ||
df2 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}) | ||
df2.iloc[:, [1]] = df1.iloc[:, [0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there other indexer/value type combinations that are relevant here? e.g. the value being set here is a DataFrame, would a Series trigger the same bug? what if instead of [1]
the indexer was slice(1, 2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df1 = DataFrame({"a2": [11, 12, 13], "b2": [14, 15, 16]})
df2 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
df2.iloc[:, [1]] = df1.iloc[:, 0]
raises
Traceback (most recent call last):
File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 132, in <module>
df2.iloc[:, [1]] = df1.iloc[:, 0]
File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 684, in __setitem__
iloc._setitem_with_indexer(indexer, value)
File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1637, in _setitem_with_indexer
self._setitem_single_block(indexer, value)
File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1851, in _setitem_single_block
self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 562, in setitem
return self.apply("setitem", indexer=indexer, value=value)
File "/home/developer/PycharmProjects/pandas/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/home/developer/PycharmProjects/pandas/pandas/core/internals/blocks.py", line 1005, in setitem
values[indexer] = value
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (1,3)
Process finished with exit code 1
Did not check slice previously, this triggered the bug too. Will add test
pandas/core/indexing.py
Outdated
@@ -1816,13 +1821,13 @@ def _setitem_single_block(self, indexer, value): | |||
|
|||
indexer = maybe_convert_ix(*indexer) | |||
|
|||
if isinstance(value, (ABCSeries, dict)): | |||
if isinstance(value, (ABCSeries, dict)) and name != "iloc": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have a dict here dont we still want to wrap it in a Series?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, have no tests getting here with iloc. Will add one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LMK when these tests are all ready and ill take another look. this one is almost ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are all in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. minor comments can be a followup. @jbrockmendel merge when good.
pandas/core/indexing.py
Outdated
@@ -1714,7 +1714,7 @@ def _setitem_with_indexer_2d_value(self, indexer, value): | |||
# setting with a list, re-coerces | |||
self._setitem_single_column(loc, value[:, i].tolist(), pi) | |||
|
|||
def _setitem_with_indexer_frame_value(self, indexer, value: "DataFrame"): | |||
def _setitem_with_indexer_frame_value(self, indexer, value: "DataFrame", name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type name here (and above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/indexing.py
Outdated
@@ -1787,7 +1792,7 @@ def _setitem_single_column(self, loc: int, value, plane_indexer): | |||
# reset the sliced object if unique | |||
self.obj._iset_item(loc, ser) | |||
|
|||
def _setitem_single_block(self, indexer, value): | |||
def _setitem_single_block(self, indexer, value, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/indexing.py
Outdated
@@ -1854,7 +1858,7 @@ def _setitem_with_indexer_missing(self, indexer, value): | |||
if index.is_unique: | |||
new_indexer = index.get_indexer([new_index[-1]]) | |||
if (new_indexer != -1).any(): | |||
return self._setitem_with_indexer(new_indexer, value) | |||
return self._setitem_with_indexer(new_indexer, value, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double check me on this, but i think it might be the case that we only get here with name = "loc"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you are right. Iloc raises when indexer is a dict. That is the only way we can get in there. Removed it
thanks @phofl nice! |
expected = DataFrame({"a": [1, 2, 3], "b": [11, 12, 13], "c": [7, 8, 9]}) | ||
tm.assert_frame_equal(df2, expected) | ||
|
||
def test_setitem_iloc_dictionary_value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick for future: setitem_iloc -> iloc_setitem
agreed thanks @phofl, this wasnt easy, glad to see it fixed |
@@ -1815,14 +1821,13 @@ def _setitem_single_block(self, indexer, value): | |||
return | |||
|
|||
indexer = maybe_convert_ix(*indexer) | |||
|
|||
if isinstance(value, (ABCSeries, dict)): | |||
if isinstance(value, ABCSeries) and name != "iloc" or isinstance(value, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for follow-up, can you put parens where appropriate so i dont have to think about whether the "and" gets evaluated before the "or"
@@ -1854,7 +1859,7 @@ def _setitem_with_indexer_missing(self, indexer, value): | |||
if index.is_unique: | |||
new_indexer = index.get_indexer([new_index[-1]]) | |||
if (new_indexer != -1).any(): | |||
return self._setitem_with_indexer(new_indexer, value) | |||
return self._setitem_with_indexer(new_indexer, value, "loc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment explaining that we only get here with name=="loc" and thats why its hardcoded (or assertion)
Thanks @jbrockmendel and @jreback |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
If iloc should be purely position based, this PR fixes the bug described in the issue. One test depends on the wrong behavior, so I had to adjust it.